Comlex Syntax: Building a Computational Lexicon

Authors

  • Ralph Grishman
  • Catherine Macleod
  • Adam Meyers
Abstract

We describe the design of Comlex Syntax, a computational lexicon providing detailed syntactic information for approximately 38,000 English headwords. We consider the types of errors which arise in creating such a lexicon, and how such errors can be measured and controlled.

1 Goal

The goal of the Comlex Syntax project is to create a moderately-broad-coverage lexicon recording the syntactic features of English words for purposes of computational language analysis. This dictionary is being developed at New York University and is to be distributed by the Linguistic Data Consortium, to be freely usable for both research and commercial purposes by members of the Consortium.

In order to meet the needs of a wide range of analyzers, we have included a rich set of syntactic features and have aimed to characterize these features in a relatively theory-neutral way. In particular, the feature set is more detailed than those of the major commercial dictionaries, such as the Oxford Advanced Learner's Dictionary (OALD) [4] and the Longman Dictionary of Contemporary English (LDOCE) [8], which have been widely used as a source of lexical information in language analyzers.¹ In addition, we have aimed to be more comprehensive in capturing features (in particular, subcategorization features) than commercial dictionaries.

… by Prof. Roger Mitton from the Oxford Advanced Learner's Dictionary, and contains about 38,000 head forms, although some purely British terms have been omitted. Each entry is organized as a nested set of typed feature-value lists. We currently use a Lisp-like parenthesized list notation, although the lexicon could …

¹ To facilitate the transition to COMLEX by current users of these dictionaries, we have prepared mappings from COMLEX classes to those of several other dictionaries.

Some sample dictionary entries are shown in Figure 1.
The first symbol gives the part of speech; a word with several parts of speech will have several dictionary entries, one for each part of speech. Each entry has an :orth feature, giving the base form of the word. Nouns, verbs, and adjectives with irregular morphology will have features for the irregular forms :plural, :past, :past-part, etc. Words which take complements will have a subcategorization (:subc) feature. For example, the verb "abandon" can occur with a noun phrase followed by a prepositional phrase with the preposition "to" (e.g., "I abandoned him to the linguists.") or with just a noun phrase complement ("I abandoned the ship."). Other syntactic features are recorded …
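
To make the entry layout concrete, here is a minimal Python sketch of reading one entry in a Lisp-like parenthesized notation. The sample entry is a hypothetical illustration built from the description above (part of speech first, then an :orth feature and a subcategorization feature). The frame names np-pp and np, the :pval keyword, and the helper names parse and to_entry are assumptions made for this example, not taken from the lexicon itself.

```python
import re

# Hypothetical entry modelled on the prose description; the frame names
# (np-pp, np) and the :pval keyword are illustrative assumptions.
SAMPLE = '(verb :orth "abandon" :subc ((np-pp :pval ("to")) (np)))'

# Tokens: parentheses, double-quoted strings, or bare symbols/keywords.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Parse one parenthesized expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing parenthesis
            return items
        if tok.startswith('"'):
            return tok[1:-1]  # string literal without its quotes
        return tok  # bare symbol or :keyword

    return read()


def to_entry(expr):
    """Split a parsed entry into its part of speech and a feature dict."""
    pos_tag, rest = expr[0], expr[1:]
    # Features are assumed to come as alternating :name value pairs.
    features = {rest[i].lstrip(":"): rest[i + 1] for i in range(0, len(rest), 2)}
    return {"pos": pos_tag, **features}


entry = to_entry(parse(SAMPLE))
frames = [frame[0] for frame in entry["subc"]]
print(entry["pos"], entry["orth"], frames)
```

Keeping the parsed form as plain nested lists mirrors the "nested set of typed feature-value lists" described in the abstract; a fuller reader would also need error handling for unbalanced parentheses, which this sketch omits.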

Related resources

The Comlex Syntax Project: The First Year

We describe the design of Comlex Syntax, a computational lexicon providing detailed syntactic information for approximately 38,000 English headwords. We consider the types of errors which arise in creating such a lexicon, and how such errors can be measured and controlled.

The COMLEX Syntax Project

Developing more shareable resources to support natural language analysis will make it easier and cheaper to create new language processing applications and to support research in computational linguistics. One natural candidate for such a resource is a broad-coverage dictionary, since the work required to create such a dictionary is large but there is general agreement on at least some of the i...

No Escape from Syntax: Don't Try Morphological Analysis in the Privacy of Your Own Lexicon

Most contemporary theories of grammar assume a general organization in which elementary constituents are drawn from a place called the "Lexicon" for composition in the syntax, as in (1). (1) STUFF Syntax Lexicon Sound Meaning (Pure) Lexicon: place from which items are drawn for the syntax; the source of items used by the computational system of syntax. While it is uncontroversial that our know...

Tagging as a Means of Refining and Extending Syntactic Classes

Comlex Syntax is a moderately-broad-coverage English lexicon (with about 38,000 root forms) being developed at New York University under contract to the Linguistic Data Consortium; the first version of the lexicon was delivered in May 1994. The lexicon is available to members of the Linguistic Data Consortium for both research and commercial applications. It was developed for use in processin...

The Influence of Tagging on the Classification of Lexical Complements

A large corpus (about 100 MB of text) was selected and examples of 750 frequently occurring verbs were tagged with their complement class as defined by a large computational syntactic dictionary, COMLEX Syntax. This tagging task led to the refinement of already existing classes and to the addition of classes that had previously not been defined. This has resulted in the enrichment and improv...

Publication date: 1994